Welcome back to deep learning. Today we want to continue talking about convolutional neural networks. What we really want to see in this lecture are the building blocks for constructing deep neural networks, and the convolutional layer is one of the most important of them. So far we had those fully connected layers where each input is connected to each node. This is very powerful because it can represent any kind of linear relationship between the inputs. In particular, between every two layers we have one matrix multiplication, which essentially means that from one layer to the next we can have an entire change of representation. It also means that we have a lot of connections.
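To make this concrete, here is a minimal sketch of a fully connected layer as one matrix multiplication plus a bias; the toy sizes and random weights are illustrative assumptions, not anything fixed by the lecture.

```python
import numpy as np

def fully_connected(x, W, b):
    # every input is connected to every output node:
    # one weight per connection, one bias per node
    return W @ x + b

rng = np.random.default_rng(0)
n_in, n_out = 4, 3                  # toy layer sizes
W = rng.normal(size=(n_out, n_in))  # 4 * 3 = 12 trainable weights
b = rng.normal(size=n_out)          # 3 trainable biases

x = rng.normal(size=n_in)
y = fully_connected(x, W, b)
print(y.shape)  # (3,) -- an entirely new representation of x
```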
So let's think about images, videos, and sounds in machine learning. Here this is a bit of a disadvantage, because they typically have huge input sizes, and we have to think about how to deal with those. Let's say we have an image with 512 x 512 pixels. That means that one hidden layer with 8 neurons already has (512² + 1) × 8 trainable weights, counting one extra weight per neuron for the bias. That's more than 2 million trainable weights, just for a single hidden layer. Of course, this is not the way to go, and size really is a problem. But there's more to that.
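As a quick check of that number, the arithmetic from the example above can be reproduced in a few lines of Python:

```python
pixels = 512 * 512              # inputs from a 512 x 512 image
neurons = 8                     # one small hidden layer

# each neuron needs one weight per pixel plus one bias
weights = (pixels + 1) * neurons
print(weights)                  # 2097160 -- more than 2 million
```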
Let's say we want to classify between a cat and a dog. If you look at those two images here, you can see that a large part of each image just contains empty areas, so those parts are not very relevant. Pixels in general are very bad features: they are highly correlated, scale-dependent, and subject to intensity variations. So there is a huge problem, and pixels are a bad representation from a machine learning point of view. You want to create something that is more abstract and summarizes the information better. So the question is: can we find a better representation? We have a certain degree of locality in an image, of course, so we can try to find the same macro features at different locations and then reuse them. Ideally, we want to construct something like a hierarchy of features, where edges and corners form eyes, then eyes, nose, and ears form a face, and finally face, body, and legs compose an animal. So composition matters, and if you can learn a better representation, then you can also classify better. This is really the key, and what we often see in convolutional neural networks is that the early layers find very simple descriptors, the intermediate layers find more abstract representations, such as eyes and noses, and in the higher layers you really find detectors for entire objects, for example faces. So we want to have local sensitivity, but then we want to scale it over the entire network in order to also model these layers of abstraction. We can do that by using convolutions in the neural network.
So here is the general idea of these architectures. Instead of fully connecting everything with everything, they use a so-called receptive field for every neuron that acts like a filter kernel. The same weights are then applied over the entire image, which is essentially a convolution, and this produces different so-called feature maps. Next, the feature maps go to a pooling layer.
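As an illustration of this idea, here is a minimal sketch of how one feature map could be computed with a single shared kernel; the plain-NumPy loop and the edge-detector kernel are illustrative assumptions, not the lecture's implementation.

```python
import numpy as np

def feature_map(image, kernel):
    # slide one shared kernel over the image (a 'valid' 2D correlation);
    # every output neuron only sees its local receptive field
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)
kernel = np.array([[1., 0., -1.],   # a simple vertical-edge detector
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(feature_map(image, kernel).shape)  # (6, 6) -- one feature map
```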
The pooling layer then brings in the abstraction and shrinks the image, i.e., it reduces the resolution. After that, we can again apply convolution and pooling and go into the next stage.
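A minimal sketch of such a pooling step, assuming 2 x 2 max pooling (other window sizes and pooling functions are possible as well):

```python
import numpy as np

def max_pool_2x2(x):
    # keep the maximum of each non-overlapping 2x2 block,
    # halving the resolution in both dimensions
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))  # [[ 5.  7.]
                           #  [13. 15.]]
```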
You can do this until you have some abstract representation, and the abstract representation is then fed to a fully connected layer. In the end, this fully connected layer maps to the final classes, which are then car, truck, van, and the like. So this is the classification result. We need convolutional layers, activation functions, and pooling to get the abstraction and to reduce the dimensionality, and in the last layers we find fully connected ones for the classification.
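Putting these pieces together, a minimal sketch of such an architecture could look as follows, assuming PyTorch is available; the channel counts, the 32 x 32 input size, and the five classes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# convolution -> activation -> pooling, twice, then a fully connected
# layer that maps the abstract representation to the final classes
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),    # 8 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x32 -> 16x16
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # 16 more abstract maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 5),                     # e.g. car, truck, van, ...
)

x = torch.randn(1, 3, 32, 32)    # one RGB image of 32 x 32 pixels
print(model(x).shape)            # torch.Size([1, 5]) -- class scores
```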
So let's start with the convolutional layers. The idea here is that we want to exploit the spatial structure by only connecting pixels in a neighborhood. This can still be expressed as a fully connected layer: if we write it as a matrix, we set every entry in the matrix to zero except for the connections that lie in the receptive field of the local filter kernel. This means that we can neglect the many connections over large spatial distances. Another trick is that you use small filters of size 3 x 3, 5 x 5, or 7 x 7, and you share the same weights over the entire image.
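To see this matrix view concretely, here is a small sketch that builds such a mostly-zero weight matrix for a one-dimensional signal; the 1-D setting and the concrete kernel are simplifying assumptions to keep the matrix readable.

```python
import numpy as np

def conv_as_matrix(kernel, n):
    # write a 1D 'valid' convolution as a weight matrix: every entry is
    # zero except inside each receptive field, and because of weight
    # sharing every row contains the same kernel, just shifted
    k = len(kernel)
    W = np.zeros((n - k + 1, n))
    for i in range(n - k + 1):
        W[i, i:i + k] = kernel
    return W

kernel = np.array([1., 2., 1.])
W = conv_as_matrix(kernel, 6)
print(W)
# [[1. 2. 1. 0. 0. 0.]
#  [0. 1. 2. 1. 0. 0.]
#  [0. 0. 1. 2. 1. 0.]
#  [0. 0. 0. 1. 2. 1.]]

x = np.arange(6.)
print(W @ x)                                       # [ 4.  8. 12. 16.]
print(np.convolve(x, kernel[::-1], mode="valid"))  # the same result
```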